A Quick Introduction to Version Control with Git and GitHub
Authors
Abstract
Many scientists write code as part of their research. Just as experiments are logged in laboratory notebooks, it is important to document the code you use for analysis. However, a few key problems can arise when iteratively developing code that make it difficult to document and track which code version was used to create each result. First, you often need to experiment with new ideas, such as adding new features to a script or increasing the speed of a slow step, but you do not want to risk breaking the currently working code. One common solution is to make a copy of the script before making new edits. However, this can quickly become a problem because it clutters your file system with uninformative filenames, e.g., analysis.sh, analysis_02.sh, analysis_03.sh, etc. It is difficult to remember the differences between the versions of the files and, more importantly, which version you used to produce specific results, especially if you return to the code months later. Second, you will likely share your code with multiple lab mates or collaborators, and they may have suggestions on how to improve it. If you email the code to multiple people, you will have to manually incorporate all the changes each of them sends.
Fortunately, software engineers have already developed software to manage these issues: version control. A version control system (VCS) allows you to track the iterative changes you make to your code. Thus, you can experiment with new ideas but always have the option to revert to a specific past version of the code you used to generate particular results. Furthermore, you can record messages as you save each successive version so that you (or anyone else) reviewing the development history of the code can understand the rationale for the given edits. It also facilitates collaboration. Using a VCS, your collaborators can make and save changes to the code, and you can automatically incorporate these changes into the main code base. The collaborative aspect is enhanced with the emergence of websites that host version-controlled code.
In this quick guide, we introduce you to one VCS, Git (https://git-scm.com), and one online hosting site, GitHub (https://github.com), both of which are currently popular among scientists and programmers in general. More importantly, we hope to convince you that although mastering a given VCS takes time, you can already achieve great benefits by getting started using a few simple commands. Furthermore, not only does using a VCS solve many common problems when writing code, it can also improve the scientific process. By tracking your code
Similar resources
Summarizing Git Commits and GitHub Pull Requests Using Sequence to Sequence Neural Attention Models
Every day millions of developers and programmers push commits to GitHub to ensure their projects are version controlled, reproducible, and remotely accessible. There are nearly 20 million public repositories (collections of source code in the form of projects) on GitHub today, and over 16 million unique users. Users are able to commit additions or changes to their own repositories, as well as t...
Version Control with Git - Powerful Tools and Techniques for Collaborative Software Development: Covers GitHub, Second Edition
It is good to learn about Version Control with Git: Powerful Tools and Techniques for Collaborative Software Development on this website. This is one of the books that many people are looking for. In the past, many people have asked about this book as their favourite book to read and collect. And now, we present what you need quickly. We are happy to offer you this famous book. It will not ...
SPARQL2Git: Transparent SPARQL and Linked Data API Curation via Git
In this demo, we show how an effective and application agnostic way of curating SPARQL queries can be achieved by leveraging Git-based architectures. Often, SPARQL queries are hard-coded into Linked Data consuming applications. This tight coupling poses issues in code maintainability, since these queries are prone to change to adapt to new situations; and query reuse, since queries that might b...
Anonymized e-mail interviews with R package maintainers active on CRAN and GitHub
This technical report accompanies the research article [1] that empirically studies the problems related to inter-repository package dependencies in the R ecosystem of statistical computing, with a focus on R packages hosted on CRAN and GitHub. That article extends our earlier research on the R package ecosystem, published in [2]–[4]. The current report provides supplementary material, reproduci...
Combined Methods, Thick Descriptions: Languages of Collaboration on Github
Like many professional work activities in this age of ubiquitous computing and high-speed internet connections, computer programming and software development are increasingly mediated by systems with ‘social media’ features like profiles, avatars, ‘liking’, and commenting capabilities. When working on shared tasks, programmers have effectively leveraged these capabilities to overcome difference...